(54) Method and system for maintaining a global name space



(57) A global mount mechanism capable of main- 
taining a consistent global name space in a distributed 
computing system including a plurality of nodes inter- 
connected by a communications link is herein disclosed. 
The global mount mechanism mounts a new file system 
resource into the global name space in a coherent man- 
ner such that the new file system resource is mounted 
at the same mount point concurrently in each node. The
global mount mechanism accommodates mount or unmount
requests initiated from a requesting node for a
resource located in a remote node. The global mount 
mechanism is also used to unmount a file system re- 
source from the global name space. The global mount 
mechanism also includes an initialization procedure that 
is used to generate the global name space initially by 
providing each local mount point with a global locking 
capability. 
Description

The present invention relates generally to a distrib- 
uted file system and particularly to a method and system 
for maintaining a global name space in a distributed file 
system. 

BACKGROUND OF THE INVENTION 

A cluster is a group of independent computing 
nodes connected by a high-speed communications link. 
Each computing node has one or more processes 
where each process has its own address space. Each 
process can access data that is associated with a file 
system that exists in the cluster. The file system can be 
resident in the node associated with the process or in 
another node within the cluster. 

The cluster has a global name space which repre- 
sents the file systems accessible to each node within 
the cluster. Each node may also have a local name 
space representing the file systems accessible to proc- 
esses associated with a particular node. A user associ- 
ated with a particular node can mount or connect a file 
system local to one node into the global name space. 
Furthermore, a user can unmount or disconnect a file 
system from the global name space thereby making the 
file system inaccessible to each node in the cluster. 

It is beneficial for each node to have a single system 
image of the global name space. However, maintaining 
this image is complicated by issues of coherency, re- 
source location, and transparency. Coherency must be 
achieved in mounting and unmounting a file system at 
the same mount point within the cluster and at the same 
point in time. Otherwise, each node can mount a file sys- 
tem at a different mount point or access an unmounted 
file. 

From the view point of users issuing mount and un- 
mount commands, the existence of the global name 
space should be as transparent as possible. This trans- 
parency will minimize the changes required to the inter- 
face of the mount and unmount command as well as to 
user application programs and data. 

Furthermore, in some instances the resources 
needed to mount a file system are not always accessible 
from all nodes in the cluster. This can affect the mount 
of a file system initiated from one node when the re- 
sources associated with the file system are best ac- 
cessed from another node. In order to perform the 
mount task, it becomes necessary to overcome this ob- 
stacle. 

Accordingly, there exists a need to maintain a global 
name space in a distributed computing environment in 
a manner that accounts for the aforementioned con- 
straints. 

SUMMARY OF THE INVENTION 

Particular and preferred aspects of the invention are set
out in the accompanying independent and dependent claims.
Features of the dependent claims may be combined with those
of the independent claims as appropriate and in combinations
other than those explicitly set out in the claims.

An embodiment of the present invention provides a
global mount mechanism capable of maintaining a consistent
global name space in a distributed computing
system. The distributed computing system includes a
cluster of nodes interconnected by a communications
link. The global mount mechanism mounts a new file
system resource into the global name space and unmounts
a mounted file system resource in a coherent
manner. Coherency is achieved by mounting the file
system resource at the same mount point within the
cluster and at the same point in time. The global mount
mechanism utilizes a distributed locking mechanism to
ensure that the mount or unmount operation is performed
in a coherent manner. The global mount mechanism
accounts for the disparity in file system resource
distribution by allowing a file system resource to be
mounted by a node not associated with the file system
resource.

The global name space is a collection of file system
resources that are accessible from each node in the
cluster. Each file system resource mediates access to
a set of file resources belonging to its associated file
system resource. Each file resource is represented by
a pathname that can include one or more directories.
Each directory in the global name space can serve as
a global mount point at which a new file system resource
can be mounted or incorporated into the global name
space. When a new file system resource is mounted at
a particular mount point, the file resources it mediates
become accessible through pathnames that start with
the mount point's pathname.

A server node that is associated with a file system
resource includes a virtual file system (VFS) mechanism
and a file system object (FSobj) to represent the file
system resource. In addition, each client node includes a
proxy VFS mechanism and proxy FSobj to represent the
file system resource.

A virtual file system node (vnode) mechanism is
used to represent each directory in the global name
space. The vnode mechanism is used as a mount point
at which a VFS or proxy VFS mechanism is attached
thereby incorporating the new file system resource into
the global name space.

The global mount mechanism includes an initialization
mechanism that generates the global name space
initially. At system initialization, each node has a local
name space including a number of local file system
resources that are only accessible from within the node.
The initialization mechanism gives one or more local
directories or local mount points a global locking capability
that enables the local mount point to be locked by any node
in the cluster. The global locking capability turns the
local mount point into a global mount point that is part of
the global name space. One or more local file system
resources can then be mounted at a global mount point
and, hence, become part of the global name space.

BRIEF DESCRIPTION OF THE DRAWINGS

Exemplary embodiments of the invention are de- 
scribed hereinafter, by way of example only, with refer- 
ence to the accompanying drawings, in which: 

Fig. 1 is a block diagram of a distributed computing
system incorporating the preferred embodiments of
the present invention.

Fig. 2A represents an exemplary global name
space and a file system that will be mounted into
the global name space.

Fig. 2B represents the global name space of Fig.
2A after the mount of the file system is performed.

Fig. 3A represents an exemplary distributed file
system of the global name space shown in Fig. 2A.

Fig. 3B represents an exemplary distributed file
system of the global name space shown in Fig. 2B.

Fig. 4 is a block diagram of a distributed computing
system incorporating the preferred embodiments of
the present invention.

Figs. 5A and 5B illustrate the vnode and VFS data 
structures used in an embodiment of the present in- 
vention. 

Fig. 5C illustrates the PxFobj data structure used in
an embodiment of the present invention.

Fig. 6 is a flow chart illustrating the steps used by
the global mount mechanism in mounting a new file
system resource into the global name space.

Fig. 7 illustrates the distributed locking mechanism
used in an embodiment of the present invention.

Figs. 8A - 8D illustrate by way of an example the
global mount mechanism of Fig. 6.

Fig. 9 illustrates the steps used by the global mount
mechanism in unmounting a mounted file system
resource from the global name space.

Fig. 10 illustrates by way of an example the global
unmount mechanism of Fig. 9.

Fig. 11 illustrates the steps used to generate a global
mount point in a local name space.

Fig. 12 illustrates the distributed locking mechanism
used in Fig. 11.

DESCRIPTION OF THE PREFERRED EMBODIMENT 

Overview of the File System Data Structures 

Referring to Fig. 1, there is shown a distributed
computing system 100 including a plurality of computing
nodes 102. Each computing node 102 represents an in- 
dependent client/server computer that is interconnected 
via a communications link 104. Each node can act as 
either a client or a server computer or both. With respect 
to a given file system resource, one node can act as the 
server computer for the resource and other nodes as 
client computers. A client computer is associated with a 
node that accesses file system resources over the com- 
munication link and a server computer is associated with 
a node that provides file system resources over the com- 
munication link. However, the classification of a client 
and server computer for a particular file system resource 
can vary over time. 

The distributed computing system 100 utilizes a distributed
file system that includes a local layer (Local) and
a cluster layer (Cluster). The local layer includes a phys- 
ical file system and a vnode/VFS interface. The physical 
file system is any file system that stores file data on a 
data storage device that is local to the node. Examples 
of physical file systems can include but are not limited 
to the MSDOS PC file system, the 4.3BSD file system, 
the Sun network file system (NFS), and the like. 

The vnode/VFS interface is an interface between
the operating system and the physical file system. The 
vnode/VFS interface accommodates multiple file sys- 
tem implementations within any Unix operating system 
or kernel. A file system can be incorporated into the ker- 
nel through the vnode/VFS interface. A vnode (i.e., vir- 
tual file node) 118 is a data structure that contains op- 
erating system data describing a particular file. A virtual 
file system (VFS) 120 is a data structure that contains 
operating system data describing a particular file sys- 
tem. 

The cluster layer represents the file system resourc- 
es that are accessible from any node within the cluster. 
It should be noted that the term "file system resource"
as used herein represents information needed to characterize
a set of files, a file system, a directory or a group
of directories. In addition, a directory can be considered 
a file. In the cluster file layer, each file system resource 
is represented as an object. A server node 102b will 
have a file system object (FSobj) 116 for each file sys- 
tem resource or file system under its control and a file 
object (Fobj) 114 for each directory that is under the 
server node's control. A client node 102a will have a 
proxy file object (PxFobj) 122 for each file that is ac- 
cessed from a remote node and a proxy file system ob- 
ject (PxVFS) 124 for each file system or resource that 
is accessed from a remote node. In the case where the
client and server computer are the same node, the client
node 102a will have a file system object (FSobj) 116 and
a file object (Fobj) 114 for a file system resource when
the client node 102a is acting as the server for the file
system resource and will have proxy data structures
when the client node 102a is acting as a client for the
resource. The proxy file object 122 contains an object
reference to the associated file object 114 and the proxy
file system object 124 contains an object reference to
the associated file system object 116.
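
The relationship between the server-side objects and their client-side proxies can be sketched as follows. This is a minimal single-process illustration in Python, not the implementation described here: the class bodies, the method names `read` and `get_type`, and the `rpc` helper are assumptions made for the example, and the remote call is simulated by a direct call.

```python
class Fobj:
    """Server-side file object (stand-in for Fobj 114)."""

    def __init__(self, name, contents):
        self.name = name
        self.contents = contents

    def read(self):
        return self.contents


class FSobj:
    """Server-side file system object (stand-in for FSobj 116)."""

    def __init__(self, fs_type):
        self.fs_type = fs_type

    def get_type(self):
        return self.fs_type


def rpc(object_ref, method, *args):
    """Simulated RPC (hypothetical helper). A real cluster would marshal
    the call over the communications link; here it is a direct call."""
    return getattr(object_ref, method)(*args)


class PxFobj:
    """Client-side proxy holding an object reference to a remote Fobj."""

    def __init__(self, fobj_ref):
        self.fobj_ref = fobj_ref

    def read(self):
        return rpc(self.fobj_ref, "read")


class PxVFS:
    """Client-side proxy holding an object reference to a remote FSobj."""

    def __init__(self, fsobj_ref):
        self.fsobj_ref = fsobj_ref

    def get_type(self):
        return rpc(self.fsobj_ref, "get_type")


# The server node owns the real objects.
server_fs = FSobj("ufs")
server_file = Fobj("myfile.c", b"int main(void) { return 0; }\n")

# A client node holds only proxies that refer back to them.
client_file = PxFobj(server_file)
client_fs = PxVFS(server_fs)
```

Every operation on a proxy is forwarded through its stored object reference, which is why the client never needs its own copy of the server-side state.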

The client and server communicate through remote
procedure calls (RPCs). One or more threads associated
with a client node can access a remote file system
resource through a remote object invocation using the
proxy object reference in an RPC. A more detailed
description pertaining to the implementation of the remote
object invocation can be found in pending U.S. Patent
Application, serial no. entitled "A System and Method
for Remote Object Invocation," filed June 19, 1997, and
assigned to Sun Microsystems Inc.

The cluster layer represents the global name space.
In addition, each node has a local name space representing
file systems or resources that are locally accessible
only to that node. A node can incorporate one or
more file systems into the global name space with the
mount command and can remove file systems from the
global name space with the unmount command. A more
detailed description of the vnode and VFS interfaces
can be found in Kleiman, Steven R., "Vnodes: An Architecture
for Multiple File System Types in Sun UNIX,"
Proceedings of the Summer 1986 USENIX Conference,
Atlanta, 1986.

Fig. 1 illustrates the aforementioned infrastructure
of the local and cluster layers for an exemplary file
system containing a root directory and the file myfile.c as
shown below.

/ (root directory)
    myfile.c

The local layer of the distributed file system on the
server node 102b includes a vnode 118 representing the
file myfile.c. Each vnode contains a reference to data
that is specific to the kind of file system that it represents
(i.e., file system specific data). A vnode contains one
such reference. These references can vary. In Fig. 1,
there are shown two such references 106, 108 and it
should be noted that the two references are shown for
illustration purposes only.

For example, vnode 118 can represent a file in a
UFS file system. In this case, the vnode 118 contains a
reference to an inode 106 that holds particular information
on the file's representation on the data storage medium.
An inode 106 is used to represent a Unix File System
(UFS) file and is linked to the associated data storage
medium 110 that stores the file. Further, vnode 118
can represent a file in an NFS file system. In this case,
the vnode 118 contains a reference to an rnode 108 that
represents a Network File System (NFS) file. The rnode 108
is linked to a network interface 112 that is used to access
the remote data storage medium containing the file.
However, it should be noted that the present invention
is not limited to UFS or NFS file systems and other types
of file systems can be used in this context as well.

Alternatively, a vnode 117 can represent a remote
file system resource where the file system specific data
is a proxy object reference (PxFobj) that refers to vnode
118. From the viewpoint of the operating system running
on node 102a, there is no difference between vnode 117
or any other vnode 118 existing on that node. The
operating system accesses each vnode in the same manner.

The vnode 118 is linked to a VFS 120. A VFS 120, 121
represents the overall file system. Each VFS 120, 121
contains a reference to data that is specific to the kind of
file system that it represents (i.e., file system specific
data). For example, VFS 120 represents a particular file
system and contains a reference to file system specific
data (not shown). Proxy VFS 121 represents a remote file
system resource and its file system specific data is a proxy
object reference (PxVFS) 124 that refers to FSobj 116.
From the point of view of the operating system running
on node 102a, there is no difference between proxy VFS
121 or any other VFS 120 existing on that node. The
operating system accesses each VFS in the same manner.

It should be noted that from the client node's view- 
point, the proxy file system is just another file system 
type on par with an NFS or UFS file system. However, 
internally it operates by pairing proxy vnodes 117 on 
each client node with corresponding vnodes 118 on the 
server node. The file system specific data for the proxy 
file system (i.e., the PxFobj and the PxVFS) contains 
the linkage information required to maintain the associ- 
ation with the corresponding server node. 

A more detailed description of the proxy file system 
can be found in Matena, et al., "Solaris MC File System 
Framework," Sun Microsystems Laboratories Technical 
Report SMLI TR-96-57, October 1996 which is hereby 
incorporated by reference as background information. 

The server-side cluster layer of the file system includes
a file object (Fobj) 114 representing the file,
myfile.c. The Fobj 114 is linked to the vnode 118 associated
with the file. In addition, there is a file system object
(FSobj) 116 representing the file system as a whole.

The client-side local layer of the file system includes
a proxy vnode 117 representing myfile.c. The proxy
vnode 117 is linked to a VFS 120 representing the file
system associated with myfile.c.

The proxy vnode 117 is linked to a proxy file object
(PxFobj) 122 which contains a reference to the associated
Fobj 114 for myfile.c. The PxFobj 122 is associated
with the cluster layer of the file system. The client node
102a can access the file myfile.c through an RPC utilizing
the object reference contained in PxFobj 122. In addition,
the VFS 121 is linked to a PxVFS 124 which contains
a reference to the file system object FSobj 116
representing the file system. The client node can access
the file system object through an RPC utilizing the object
reference in PxVFS 124.

The aforementioned description details some of the
data structures used to support the distributed file sys- 
tem in the cluster environment of the present invention. 
A global mount mechanism is provided that utilizes 
these data structures as well as additional procedures 
and data structures to maintain a global name space for 
each node in the cluster. A brief synopsis of the global 
mount mechanism is shown in Figures 2A-2B and 3A- 
3B. 

Figs. 2A - 2B illustrate a global name space 134
associated with each node 102 in the cluster. The global
name space is a collection of file system resources, 
each containing a set of files where each file is repre- 
sented by a pathname. The pathname can include one 
or more directories that are organized in a hierarchical 
structure. Each directory can serve as a mount point for 
incorporating a new file system resource into the global 
name space. 

In this example, there is a file system 132 associated
with a first node 102a that will be mounted into the
global name space 134. At the completion of the global 
mount, the global name space 134 will appear as the 
single image shown in Fig. 2B. In order for the global 
name space to appear as a single image, the file system 
is mounted into the global name space at the same 
mount point in each node concurrently. In Fig. 2B, the 
common mount point is /mnt to which the file system 
132 with root directory 2 is mounted. 

Figs. 3A - 3B illustrate the changes made to the 
cluster layer of the file system in order to mount the ad- 
ditional file system with root directory 2 into the global 
name space. Fig. 3A shows the file system with respect 
to the mount point /mnt before the mount and Fig. 3B 
shows the file system with respect to the mount point 
/mnt after the mount. 

As shown in Fig. 3A, each node 102 has a vnode
118 or proxy vnode 117 representing the mount point,
which in this example is the directory /mnt. The vnode
118 or its proxy 117 is linked to the VFS associated
with the file system containing the mount point. For each
client node 102a, 102c, the proxy vnode 117 for the
mount point is linked to a proxy file object 122. For the
server node 102b associated with the mount point, there
is a file object Fobj 114 linked to the vnode 118. There
is also a file system object 116 representing the file
system which is linked to a corresponding VFS 120. This
infrastructure is in place before the file system with root
directory 2 is mounted into the global name space.

The file system with root directory 2 is mounted in
the global name space at the mount point /mnt. The file
system after the mount is shown in Fig. 3B. The proxy
vnode 117 for the mount point /mnt is linked by a
mounted_here pointer 181 to the VFS representing the
file system containing root directory 2. The proxy VFS 151
is linked by a covered_vnode pointer 190 to the mount
point vnode. In each client node, the VFS 151 is linked
to a PxVFS 124. The server node includes a file system
object FSobj 156 linked to a corresponding VFS 150.
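
The linkage created by a mount can be sketched on a single node as follows. This is an illustrative Python model under stated assumptions: only the `mounted_here` and `covered_vnode` fields come from the description, while the `children` dictionary, the `resolve` function, and the file name `data.txt` are hypothetical and exist only to show how pathnames cross a mount point.

```python
class Vnode:
    """Directory or file node; field names loosely follow the description."""

    def __init__(self, name):
        self.name = name
        self.children = {}        # directory entries (assumed representation)
        self.mounted_here = None  # VFS that uses this vnode as its mount point

    def add(self, child):
        self.children[child.name] = child
        return child


class VFS:
    """A file system, identified here only by its root directory vnode."""

    def __init__(self, root):
        self.root = root
        self.covered_vnode = None  # mount point vnode covered by this VFS


def mount(mount_point, vfs):
    """Link the mount point and the VFS in both directions."""
    if mount_point.mounted_here is not None:
        raise ValueError("mount point is already in use")
    mount_point.mounted_here = vfs
    vfs.covered_vnode = mount_point


def unmount(vfs):
    """Undo both links, restoring the pre-mount name space."""
    vfs.covered_vnode.mounted_here = None
    vfs.covered_vnode = None


def resolve(root, path):
    """Walk a pathname, crossing into any file system mounted on the way."""
    node = root
    for part in [p for p in path.split("/") if p]:
        node = node.children.get(part)
        if node is None:
            return None
        while node.mounted_here is not None:
            node = node.mounted_here.root
    return node


root = Vnode("/")
mnt = root.add(Vnode("mnt"))
new_root = Vnode("2")
data = new_root.add(Vnode("data.txt"))
new_fs = VFS(new_root)
```

Before `mount(mnt, new_fs)` is called, `resolve(root, "/mnt/data.txt")` finds nothing; afterwards the same pathname reaches the file inside the newly mounted file system, and `unmount(new_fs)` restores the earlier view.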

In addition, the global mount mechanism permits a
process to unmount a file system from the global name
space. The unmount procedure is performed such that
the file system is unmounted from the same mount point
and at the same time from each node in the cluster. The
unmount of the file system having root directory 2 shown
in Fig. 3B will result in the file system representing the
global name space shown in Fig. 3A.

The aforementioned overview has presented the
infrastructure of the distributed file system and the global
mount mechanism. A more detailed description of the
global mount mechanism and its operation is presented
below.

System Architecture 

Fig. 4 illustrates the distributed computing system
100 embodying the present invention. A cluster of nodes
102 is interconnected via a communications link 104.
Each node does not share memory with the other nodes
of the cluster. The communications link 104 generically
refers to any type of wired or wireless link between
computers, such as but not limited to a local area network,
a wide area network, or a combination of networks. The
client/server computers use the communications link
104 to communicate with each other.

Each of the nodes 102 contains a number of data
structures and procedures used to support the distributed
file system and the global mount mechanism. Each
node includes an operating system or kernel 160. In a
preferred embodiment, the operating system 160 is the
Solaris MC operating system, which is a product of Sun
Microsystems, Inc. Background information on the
Solaris MC operating system can be found in "Solaris MC:
A Multi-Computer OS," Technical Report SMLI TR-95-48,
November 1995, Sun Microsystems, which is
hereby incorporated by reference.

The Solaris MC operating system is a UNIX based
operating system. As such, UNIX terminology and concepts
are frequently used in describing the present invention.
However, this is for illustration purposes and is not to be
construed as limiting the invention to this particular
operating system or file system design.

In addition, each node 102 can contain the following:

• an operating system 160;

• one or more file system (FS) factory procedures 166
that are used to instantiate a file system or VFS on
an invoking node;

• a VFS table 168 that stores one or more VFS 120,
121, 150, 151 data structures;

• a PxFobj table 169 that stores one or more PxFobj
data structures;

• a vnode table 170 that stores one or more vnode
117, 118 data structures;

• a proxy VFS (PxVFS) table 171 that stores one or
more PxVFS data structures;

• a file object (Fobj) table 172 that stores one or more
file objects (Fobj);

• a file system object (FSobj) table 173 that stores
one or more file system objects (FSobj);

• a file system resource configuration database 174
that stores all the file system (FS) factory procedures
166 within the cluster. The database 174 is
accessed using a key including the file system type
and file system resource that retrieves the FS factory
procedure 166 corresponding to the requested
resource;

• one or more cache objects 176. Each cache object
176 is associated with a PxFobj and a server-side
provider object 177. A method associated with the
cache object 176 is used to access a particular global
lock 182;

• one or more provider objects 177. Each provider object
177 is associated with a Fobj and a client-side
cache object 176. A method associated with the
provider object 177 is used to enable read or write
access to a particular vnode's global lock 182;

• a mount list 178 that records the file systems (VFS)
within the cluster;

• one or more local mount cache objects 165 that are
used to provide a local vnode with a global locking
capability;

• a proxy local mount object table 167 that stores one
or more proxy local mount objects;

• a VFS list client object 179 that interacts with the
VFS list server object 164;

• a global mount procedure 175 that is used to mount
and unmount resources;

• a bootstrap procedure 159 that is used to provide a
local vnode with a global locking capability;

• as well as other data structures and procedures.
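
As an illustration of the configuration database entry in the list above, the following Python sketch keys FS factory procedures by file system type and resource. It is a hedged sketch rather than the disclosed implementation: the class names, the dictionary returned by `instantiate`, the node name `node-b`, and the resource path `/dev/dsk/example` are all hypothetical.

```python
class FSFactory:
    """Hypothetical FS factory procedure bound to the node that serves it."""

    def __init__(self, server_node, fs_type):
        self.server_node = server_node
        self.fs_type = fs_type

    def instantiate(self, resource):
        # Stand-in for creating a VFS and file system object on the server.
        return {
            "type": self.fs_type,
            "resource": resource,
            "server": self.server_node,
        }


class ConfigDatabase:
    """Maps a (file system type, resource) key to an FS factory."""

    def __init__(self):
        self._factories = {}

    def register(self, fs_type, resource, factory):
        self._factories[(fs_type, resource)] = factory

    def lookup(self, fs_type, resource):
        # Returns None when no factory is registered for the key.
        return self._factories.get((fs_type, resource))


db = ConfigDatabase()
db.register("ufs", "/dev/dsk/example", FSFactory("node-b", "ufs"))

factory = db.lookup("ufs", "/dev/dsk/example")
fs = factory.instantiate("/dev/dsk/example")
```

Keying on both the type and the resource lets the lookup identify which node should act as server for a given resource, independently of where the request originated.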

One of the nodes is designated a list server node
102s since it tracks information regarding the globally
mounted file systems in the cluster. In addition to the
above mentioned data structures and procedures, the
list server node 102s stores a global mount list 162
delineating all the globally mounted file systems in the
cluster and stores a VFS list server object 164 that maintains
the global mount list 162 and generates the requisite
infrastructure needed to support a mounted resource in
the global name space.

Fig. 5A details the components of a vnode 118 and
proxy vnode 117. A vnode 118 or proxy vnode 117 is
used to represent each file and directory in the distributed
file system. Each proxy vnode 117 or vnode 118
can include:

• a pointer 180 to a VFS representing the file system
associated with the file or directory the vnode or
proxy vnode represents;

• a mounted_here pointer 181 linking the vnode or
proxy vnode to a VFS representing a file system that
uses the vnode or proxy vnode as its mount point;

• a local lock 183 that allows the vnode or proxy
vnode to be locked by processes local to the node;

• a method array pointer 184 that points to one or
more methods used to perform operations on the
vnode 118 or proxy vnode 117. An example of one
such method is the lookup method that is used to
find or generate a vnode;

• a data pointer 185 that points to file system specific
data pertinent to the file the vnode represents.
When the vnode is a proxy, the file system specific
data is a PxFobj which includes a pointer to the
associated Fobj;

• a flags array 186 including a proxy flag 187 indicating
whether or not the vnode is a proxy and a global
locking flag 188 indicating whether or not the vnode
118 that is otherwise part of the local name space
should use the global locking facilities when it is
locked or unlocked;

• as well as other data.
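
The fields listed above can be pictured as a single record. The Python dataclass below is only a sketch: the field names are chosen for readability, and the helper `uses_global_lock` encodes an assumption (that a vnode takes the global locking path when it is a proxy or when its global locking flag is set) rather than a rule stated in this description.

```python
import threading
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Vnode:
    vfs: Any = None            # pointer 180: containing file system
    mounted_here: Any = None   # pointer 181: file system mounted on this vnode
    local_lock: Any = field(default_factory=threading.Lock)     # local lock 183
    methods: Dict[str, Callable] = field(default_factory=dict)  # method array 184
    data: Any = None           # data pointer 185: file system specific data
    is_proxy: bool = False     # proxy flag 187
    global_locking: bool = False  # global locking flag 188


def uses_global_lock(vnode):
    """Assumed rule: proxies and flagged local vnodes lock globally."""
    return vnode.is_proxy or vnode.global_locking


plain_local = Vnode()
promoted_local = Vnode(global_locking=True)
proxy = Vnode(is_proxy=True, data="PxFobj placeholder")
```

Under this assumption, setting flag 188 on an ordinary local vnode is enough to route its lock and unlock operations through the global facilities without changing any other field.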

Fig. 5B details the components of a VFS 120, 121.
A VFS 120, 121 is used to represent each file system.
Each VFS can include:

• a covered_vnode pointer 190 that points to the
vnode that is the global mount point for the file system
associated with the VFS;

• a data pointer 192 that points to file system specific
data. When the VFS is a proxy, this file system
specific data is a PxVFS object, which in turn contains
a reference to the file system object on the server;

• as well as other data.

Fig. 5C details the components of the PxFobj 122
which can include a global lock 182 as well as other data.
The global lock 182 is used to perform atomic operations
on a vnode.

The system architecture including the data struc- 
tures and procedures used to support the global mount 
mechanism has been described above. Attention now 
turns to the operation of the global mount mechanism. 
There are two central aspects to the global mount mech- 
anism. The first is the manner in which the global mount 
mechanism is used to mount a new file system into an 
existing global name space. The second is the mecha- 
nism for establishing the global name space initially. The 
operation of the global mount mechanism in an existing 
global name space is described first, followed by a de- 
scription of the manner in which the global name space 
is generated initially. 

Mounting and Unmounting a File System in the Global 
Name Space 

The global mount mechanism mounts a file system 
into the global name space at a common mount point
on each node of the cluster. In addition, the mount oc- 
curs concurrently on each node in the cluster. A global 
lock is used to lock the proxy vnodes representing the 
mount point on all nodes while the mount mechanism is 
in operation. This ensures that no other process can al- 
ter the mount point while the global mount or unmount 
operation is proceeding. 

Similarly, the global mount mechanism unmounts a
file system from the global name space from a common
mount point concurrently in each node in the cluster. The
global lock is used to lock the vnode of the mount point
while the unmount operation is in progress.

Furthermore, the global mount mechanism allows 
one node to mount a file system whose resources reside 
in another node. This can occur, for example, when the 
NFS protocol stack is constrained to run on a single des- 
ignated node, or when a block special file whose media 
contains a UFS file system is usable only from the nodes 
where the hardware is connected. The global mount 
mechanism determines which node is appropriate to be 
the server for the resource and then utilizes the server's 
file system factory to instantiate the resource as a file 
system object. After the factory has instantiated the file 
system, the global mount mechanism uses the list serv- 
er to add the file system to the list of globally mounted 
file systems as well as notify each client node of the new 
globally mounted file system. Each client node in turn 
will set up the requisite data structures needed to mount 
the file system at the mount point concurrently. 
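
The sequence just described (select the server, instantiate through its factory, register with the list server, notify every node) can be sketched as follows. This is a simplified Python model under stated assumptions: locking is omitted, the `resource_servers` mapping stands in for the server selection step, and all class, method, and node names are hypothetical.

```python
class Node:
    """A cluster node with its own view of the mounted file systems."""

    def __init__(self, name):
        self.name = name
        self.mounts = {}  # mount point pathname -> file system record

    def factory_instantiate(self, fs_type, resource):
        # Stand-in for the FS factory running on the server node.
        return {"type": fs_type, "resource": resource, "server": self.name}

    def attach(self, mount_point, fs):
        # Stand-in for setting up the local data structures for the mount.
        self.mounts[mount_point] = fs


class ListServer:
    """Keeps the global mount list and notifies every node of new mounts."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.global_mount_list = []

    def add_and_notify(self, mount_point, fs):
        self.global_mount_list.append((mount_point, fs))
        for node in self.nodes:
            node.attach(mount_point, fs)


def global_mount(list_server, resource_servers, fs_type, resource, mount_point):
    """Mount a resource cluster-wide, whichever node issued the request."""
    server = resource_servers[resource]                  # choose the server
    fs = server.factory_instantiate(fs_type, resource)   # instantiate there
    list_server.add_and_notify(mount_point, fs)          # publish to all nodes
    return fs


node_a, node_b, node_c = Node("node-a"), Node("node-b"), Node("node-c")
list_server = ListServer([node_a, node_b, node_c])
resource_servers = {"/dev/dsk/example": node_b}
```

A call such as `global_mount(list_server, resource_servers, "ufs", "/dev/dsk/example", "/mnt")` leaves the same entry under `/mnt` on all three nodes, with `node-b` recorded as the server even if the request came from another node.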

Fig. 6 illustrates the steps used to mount a file sys- 
tem in the global name space. A user associated with a 
process issues a global mount command at a requesting 
node (step 200). The global mount command can have 
the following syntax: 

mount -g <-F file system type> <resource> <mount point>

where

the -g indicates that the resource is to be mounted
into the global name space,

the -F indicates that the following argument is a file
system type that can take one of many possible values;
two commonly used values are:
ufs, indicating a Unix file system, or
nfs, indicating a network file system,

the resource field indicates the file system resource
that will be mounted, and

the mount point field indicates the mount point or
location in the global name space where the resource
is to be mounted.
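
For illustration, the command syntax above can be parsed with Python's standard `argparse` module as shown below. This is a sketch of the argument structure only, not the actual mount command; the destination names `global_mount` and `fs_type` and the resource path `/dev/dsk/example` are assumptions made for the example.

```python
import argparse


def build_parser():
    """Parser mirroring: mount -g <-F file system type> <resource> <mount point>."""
    parser = argparse.ArgumentParser(prog="mount")
    parser.add_argument(
        "-g", dest="global_mount", action="store_true",
        help="mount the resource into the global name space",
    )
    parser.add_argument(
        "-F", dest="fs_type", default=None,
        help="file system type, for example ufs or nfs",
    )
    parser.add_argument("resource", help="file system resource to mount")
    parser.add_argument("mount_point", help="location in the name space")
    return parser


args = build_parser().parse_args(
    ["-g", "-F", "ufs", "/dev/dsk/example", "/mnt"]
)
```

Keeping `-g` as an optional flag matches the transparency goal stated earlier: an invocation without it has the same shape as a conventional local mount.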

Upon receiving a global mount command, the global 
mount procedure 175 executes a lookup method to find 
the proxy vnode 117 associated with the mount point in 
the requesting node (step 202). If no proxy vnode 117 
exists for the mount point in the requesting node, the 
lookup method will generate a proxy vnode 117 for the 
mount point and link it to the VFS 121 representing its 
containing file system. The method will also determine
the server for the mount point and request that the server
generate a file object Fobj 114 for the mount point. In
response to this request, the server will, if as yet nonexistent,
generate the file object Fobj 114 and link it to
the corresponding server-side vnode 118 representing
the mount point. In addition, the server will generate a
provider object 177 for the requesting node. An object
reference to the Fobj 114 is then returned to the requesting
node and is stored in the proxy file object PxFobj
122. The PxFobj 122 is then linked to the proxy vnode
117 of the mount point. Additionally, information is also
returned to the requesting node for it to generate an associated
cache object 176.
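
The lookup of step 202 can be sketched, under simplifying assumptions, as follows. The class names (Server, ClientNode, ProxyVnode and so on) and their methods are hypothetical stand-ins for the numbered elements described above, and remote object references are modeled as ordinary in-process references.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Provider:
    """Stand-in for a server-side provider object (177), one per client node."""
    client: str


@dataclass
class Fobj:
    """Stand-in for the server-side file object (Fobj 114) of a path."""
    path: str
    providers: Dict[str, Provider] = field(default_factory=dict)


@dataclass
class ProxyVnode:
    """Stand-in for a client-side proxy vnode (117)."""
    path: str
    pxfobj: Fobj          # models the object reference stored in PxFobj 122
    paired_with: Provider  # models cache object 176 paired with its provider


class Server:
    def __init__(self):
        self.fobjs: Dict[str, Fobj] = {}

    def get_fobj(self, path: str, client: str) -> Fobj:
        # Generate the Fobj only if it does not exist yet, then make sure
        # a provider exists for the requesting node.
        fobj = self.fobjs.setdefault(path, Fobj(path))
        fobj.providers.setdefault(client, Provider(client))
        return fobj


class ClientNode:
    def __init__(self, name: str, server: Server):
        self.name = name
        self.server = server
        self.proxies: Dict[str, ProxyVnode] = {}

    def lookup(self, path: str) -> ProxyVnode:
        # Reuse an existing proxy vnode; otherwise ask the server for the
        # Fobj and build the proxy around the returned reference.
        if path not in self.proxies:
            fobj = self.server.get_fobj(path, self.name)
            self.proxies[path] = ProxyVnode(path, fobj, fobj.providers[self.name])
        return self.proxies[path]
```

In this sketch two client nodes that look up the same mount point end up holding references to one shared Fobj, while the server holds one provider per client.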

Once the mount point vnode 118 is generated on
the server and its corresponding proxy 117 is generated
in the requesting node, the global mount procedure 175
acquires the vnode's global lock 182 for write access
(step 204). This is accomplished by using a distributed
locking scheme that employs a single writer/multiple
readers protocol. The locking scheme allows the mount
or unmount operation to be performed on the mount
point proxy vnode 117 while simultaneously blocking
conflicting operations on the same mount point vnode
117 on the other nodes. The locking scheme is based
on the distributed locking scheme recited in the "Decorum
File System Architectural Overview," Proceedings
of the USENIX Summer Conference 1990, pgs. 151-163,
which is hereby incorporated by reference as background
information.

The object of the locking scheme is to ensure that
only one process has write access to the vnode 117, 118
at a time or that multiple processes have concurrent
read access to the vnode 117, 118 at a time. This
scheme is implemented using cache 176 and provider
177 objects. The cache object 176 is used by the proxy
file object PxFobj 122 to request the global lock 182.
The global lock 182 can be acquired for either read or
write access. The provider object 177 is used to coordinate
the requested read or write access with the other
nodes.

Fig. 7 illustrates the distributed locking scheme. All
file access is performed through a proxy vnode 117. In
order to perform an operation coherently on a vnode,
the distributed locking scheme locks the collection of
proxy vnodes 117 across all nodes to accomplish the
coherent operation.

Each client node 102a, 102c has a PxFobj 122 associated
with the file object 114 for the mount point. The
PxFobj 122 has a cache object 176 that is used to request
the file's global lock for either read or write access.
The server 102b for the file has a provider object 177
for each client node. A provider object 177 is paired with
a respective client-side cache object 176. The provider
objects 177 are associated with the file object Fobj 114
associated with the vnode 118 for the mount point. A
request for the vnode's global lock 182 is made using
the PxFobj 122 to the cache object 176. The request
is transmitted from the cache object 176 to the respective
server-side provider 177 that coordinates with the
other providers 177 to determine whether the access
should be granted or should be blocked.

For a mount or unmount operation, write access is
necessary. The request is made to the associated cache
object 176 which in turn calls the corresponding provider
177. The provider 177 consults with all other providers
177 and determines whether or not the access can be
granted.

The locking protocol allows a write access when no
other read or write access is active. Alternatively, multiple
read accesses can occur concurrently when there is
no write access active. Thus, only one write access is
allowed at a time. If another provider 177 has been
granted either read or write access, an attempt will be
made to invalidate the access. If this cannot be done,
the requesting provider 177 will wait until the outstanding
write access is completed before it is granted write
access.
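
A minimal, single-process sketch of the single writer/multiple readers rule is given below. It is not the Decorum protocol itself and not the disclosed implementation: the class GlobalLock, its methods and the notion of an access being "in use" are assumptions introduced for illustration. A conflicting access that is held but idle is invalidated; one that is in use cannot be invalidated, so the requester is refused and is expected to wait and retry.

```python
class GlobalLock:
    """Toy arbiter standing in for the cooperating providers (177)."""

    def __init__(self):
        self.mode = {}       # node name -> "read" or "write" access held
        self.in_use = set()  # nodes actively using the access they hold

    def try_acquire(self, node, mode):
        """Return True if `node` is granted `mode`, False if it must wait."""
        # Attempt to invalidate conflicting accesses held by other nodes.
        # Only an access that is not currently in use can be invalidated.
        for other, held in list(self.mode.items()):
            conflicts = mode == "write" or held == "write"
            if other != node and conflicts and other not in self.in_use:
                del self.mode[other]

        # Any conflicting access that survived invalidation blocks the request.
        blocked = any(
            other != node and (mode == "write" or held == "write")
            for other, held in self.mode.items()
        )
        if blocked:
            return False

        self.mode[node] = mode
        self.in_use.add(node)
        return True

    def done(self, node):
        """The node has finished its operation; the access stays held
        until another request invalidates it."""
        self.in_use.discard(node)
```

For example, two nodes may hold read access together, a write request from a third node is refused while either reader is active, and the same request succeeds once both readers have called done().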

Referring back to Fig. 6, once the vnode's global
lock 182 has been acquired for write access, the global
mount procedure 175 determines the appropriate node
that should become the server for the resource (step
206). At times the node servicing the global mount command
may not be the appropriate server for the resource
that will be mounted. This may be attributable to several
different factors, such as constraints imposed by the resource.
As noted above, disparity in resource distribution
can occur.

In order to accommodate this problem, the global
mount procedure 175 determines which node is best
suited to act as the server for the resource (step 206).
This is accomplished by querying the cluster file system
resource database 174 for the appropriate file system
factory (fs_factory) procedure 166. Each fs_factory procedure
166 is associated with a particular node and
used to instantiate a file system and VFS. The global
mount procedure 175 then invokes the appropriate
fs_factory procedure 166 to generate a VFS 150 in the
server node for the mounted resource and a corresponding
file system object FSobj 156 (step 206).
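
One way the factory selection could look is sketched below. The layout of the resource database (a dictionary keyed either by a (file system type, resource) pair or by a file system type alone), the preference for the requesting node, and the names FsFactory and choose_factory are all assumptions made for illustration; the description above does not specify the selection policy.

```python
class FsFactory:
    """Hypothetical stand-in for a per-node fs_factory procedure (166)."""

    def __init__(self, node):
        self.node = node

    def create(self, fs_type, resource):
        # Return simple stand-ins for the VFS (150) and the FSobj (156)
        # instantiated on the server node.
        vfs = {"node": self.node, "fs_type": fs_type, "resource": resource}
        fsobj = {"vfs": vfs}
        return vfs, fsobj


def choose_factory(resource_db, fs_type, resource, requesting_node):
    """Pick a factory on a node that is able to serve the resource.

    Assumed policy: a resource-specific entry wins over a type-wide entry,
    and the requesting node is preferred when it is eligible.
    """
    candidates = resource_db.get((fs_type, resource)) or resource_db.get(fs_type, [])
    if not candidates:
        raise LookupError(f"no node can serve {fs_type} resource {resource}")
    for factory in candidates:
        if factory.node == requesting_node:
            return factory
    return candidates[0]
```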

Next, the global mount procedure 175 calls the list
server 102s with information about the newly generated
FSobj 156 and the resource used to instantiate it (step
206). The list server 102s adds the newly generated
FSobj 156 to the cluster-wide global mount list 162 (step
206). In addition, the list server 102s contacts each node
102 in the cluster and informs it of the mounted resource
and mount point and transmits an object reference to
the corresponding FSobj 156 (step 206). Each node
102, in turn, searches for a proxy VFS 151 for the FSobj
156 and a proxy vnode 117 for the mount point Fobj 114
(step 206). If these proxies do not exist in the node, they
are created. When the proxy vnode 117 for the mount
point is created, the file system specific data or PxFobj
122 is generated as well. Similarly, when the proxy VFS
151 is created, the file system specific data or PxVFS
124 is generated as well. Then the node 102 updates
the mount list 178 with the proxy VFS 151 (step 206).
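
The role of the list server in this step can be sketched as follows. The classes Node and ListServer, the method names, and the use of plain dictionaries for the proxy VFS and proxy vnode are hypothetical simplifications; the remote calls to each node are modeled as direct method calls.

```python
class Node:
    """Stand-in for a cluster node (102) and its list client."""

    def __init__(self, name):
        self.name = name
        self.proxy_vfs = {}     # FSobj reference -> proxy VFS (151)
        self.proxy_vnodes = {}  # mount point path -> proxy vnode (117)
        self.mount_list = []    # per-node mount list (178)

    def on_mount(self, fsobj_ref, mount_point):
        # Find the proxies for the FSobj and the mount point, creating
        # them if they do not exist yet on this node.
        vfs = self.proxy_vfs.setdefault(
            fsobj_ref, {"fsobj": fsobj_ref, "covered_vnode": None}
        )
        self.proxy_vnodes.setdefault(
            mount_point, {"path": mount_point, "mounted_here": None}
        )
        # Record the proxy VFS in the node's mount list exactly once.
        if not any(entry is vfs for entry in self.mount_list):
            self.mount_list.append(vfs)
        return vfs


class ListServer:
    """Stand-in for the list server (102s)."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.global_mount_list = []  # cluster-wide global mount list (162)

    def register_mount(self, fsobj_ref, resource, mount_point):
        # Add the new file system to the global list, then inform every node.
        self.global_mount_list.append((fsobj_ref, resource, mount_point))
        for node in self.nodes:
            node.on_mount(fsobj_ref, mount_point)
```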

The next step is to splice the resource into the global
name space at the mount point (step 208). The global
mount procedure 175 performs this task in each node
by setting the mounted_here pointer 181 of the mount
point's proxy vnode 117 to the proxy VFS 151 representing
the mounted resource. The covered_vnode pointer
190 of the proxy VFS 151 representing the mounted resource
is linked to the vnode 117 of the mount point.
When the mounted_here pointer 181 is set, this indicates
that a file system has been mounted at the mount
point. Finally, the global lock of the mount point's vnode
117, 118 is released (step 210).
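
The splice of step 208 amounts to setting two pointers on each node. A minimal sketch follows; the attribute names mounted_here and covered_vnode come from the description, while the classes Vnode and Vfs and the functions splice and is_mount_point are illustrative assumptions.

```python
class Vnode:
    """Stand-in for the mount point's proxy vnode (117)."""

    def __init__(self, path):
        self.path = path
        self.mounted_here = None   # mounted_here pointer (181)


class Vfs:
    """Stand-in for the proxy VFS (151) of the mounted resource."""

    def __init__(self, name):
        self.name = name
        self.covered_vnode = None  # covered_vnode pointer (190)


def splice(mount_point, vfs):
    # Link the mount point to the mounted file system and back.
    mount_point.mounted_here = vfs
    vfs.covered_vnode = mount_point


def is_mount_point(vnode):
    # A set mounted_here pointer indicates that a file system is mounted here.
    return vnode.mounted_here is not None
```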

Figures 8A - 8D illustrate the mount of the file system
132 shown in Fig. 2 into the global name space.
Referring to Fig. 8A, a client node 102a receives a global
mount command specifying that an NFS file system is
to be mounted in the global name space at the mount
point /mnt. The requesting node does not have a vnode
118 or proxy vnode 117 for the mount point. However,
on server node 102b, the Fobj 114 already exists.

Fig. 8B illustrates the file system 100 after the client 
node 102a looks up the mount point (step 202). The 
lookup method generates a proxy vnode 117 for the 
mount point on the client node 102a. A reference to the 
Fobj 114 is transmitted to the client node 102a and 
stored in a newly created proxy file object PxFobj 122a. 

Fig. 8C illustrates the file system 100 after the list
server 102s (not shown) generates the necessary infrastructure
to instantiate a file system from the designated
file system resource (step 206). A VFS 150 is generated
in the server node 102b for the file system associated
with the mounted resource, together with a corresponding
FSobj 156. The list server 102s passes an object reference to
the FSobj 156, an object reference to the mount point
object 114, and a copy of the arguments to the mount
command to each client node. In turn, each client node
102 generates a proxy vnode 117 for the mount point as
well as a proxy VFS 151 representing the mounted resource
and file system specific data 124 for that proxy
VFS 151.

Fig. 8D represents the file system after the mounted 
resource is spliced into the global name space (step 
208). The proxy vnode 117 for the mount point has its 
mounted_here pointer 181 linked to the VFS 151 repre- 
senting the new file system, and the VFS 151 has its 
covered_vnode pointer 190 set to the proxy vnode 117 
representing the mount point. 

Fig. 9 illustrates the steps used by the global mount 
mechanism 175 to unmount a mounted resource from 
the global name space. Fig. 10 illustrates an exemplary 
unmount of the mounted file system illustrated in Figs. 
8A - 8D. A process associated with a node receives a 
global unmount command (step 212). The global un- 
mount command can have the following syntax: 

unmount <mount point>

where the mount point field indicates the mount point or
location in the global name space from which the file
system resource is to be unmounted.

The global mount mechanism 175 will then look up 
the proxy vnode of the unmounted resource utilizing the 
same steps described above (step 214). In Fig. 10, the 
unmounted resource's root directory is represented by 
proxy vnode 236. After finding the unmounted re- 
source's root vnode 236, the global mount mechanism 
175 obtains the associated VFS 151 through the VFS 
pointer 180. The VFS's 151 covered_vnode pointer 190 
is traversed to vnode 117, which represents the mount 
point. The mount point vnode 117 is then locked in ac- 
cordance with the locking mechanism described above 
(step 216). 

Once the mount point vnode 117 is locked, the list
server 102s is called to have each node unsplice the
file system resource from that node's mount point proxy
vnode 117 (step 218). This is performed by deleting the
contents of the mounted_here VFS pointer 181 (step
218).

The global mount mechanism 175 will then delete 
the infrastructure used to support the unmounted re- 
source (step 220). The list server 102s will delete the 
VFS 121 from the global mount list 162 and inform the 
other client nodes to delete their VFS 121 and PxFobj
122 data structures as well. 

The global lock 182 representing the mount point is
then released in a similar manner as was described 
above (step 222). 
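
Taken together, steps 214 through 222 can be sketched for a single node as shown below. The sketch is an assumption-laden simplification: the global lock 182 is reduced to a boolean flag, the per-node fan-out through the list server is omitted, and the names Vnode, Vfs, write_locked and unmount are hypothetical.

```python
from contextlib import contextmanager


class Vnode:
    def __init__(self, path, vfs=None):
        self.path = path
        self.vfs = vfs             # VFS pointer (180) of the containing file system
        self.mounted_here = None   # mounted_here pointer (181)
        self.locked = False        # crude stand-in for the global lock (182)


class Vfs:
    def __init__(self, covered_vnode):
        self.covered_vnode = covered_vnode  # covered_vnode pointer (190)


@contextmanager
def write_locked(vnode):
    # Hold the lock for the duration of the block and always release it.
    vnode.locked = True
    try:
        yield vnode
    finally:
        vnode.locked = False


def unmount(root_vnode, mount_list):
    vfs = root_vnode.vfs                 # step 214: root vnode -> VFS
    mount_point = vfs.covered_vnode      # traverse to the mount point vnode
    with write_locked(mount_point):      # step 216 (released at step 222)
        mount_point.mounted_here = None  # step 218: unsplice
        mount_list.remove(vfs)           # step 220: drop the supporting VFS
    return mount_point
```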

The above description details the manner in which 
the global mount mechanism mounts or unmounts a file 
system to and from the global name space. Attention 
now turns to the manner in which the global name space 
is generated initially. 

Generating the Global Name Space 

A characteristic of the global name space is that 
each directory in the global name space can serve as a 
global mount point. New file systems can be incorporat- 
ed into the global name space at a global mount point. 
A distinguishing feature of a global mount point is that 
it can be locked globally. 

Initially, when each node in the cluster "boots up", 
the global name space does not exist. Instead, each 
node has a set of local vnodes representing the file re- 
sources associated with a particular node. The first step 
in incorporating these local vnodes into the global name 
space is to provide each local vnode with a global lock- 
ing capability. The global locking capability allows a local 
vnode representing the same file resource in each node 
to be locked concurrently. Once the local vnode ac- 
quires a global locking capability, the local vnode can 
be used as a global mount point and a mount can be 
established there that becomes a part of the global 
name space. 

In order for a local vnode to acquire the global locking
capability, a distributed locking mechanism is generated
to prevent two or more nodes from establishing
the global lock capability for the same local vnode at the
same time. Thus, only one node needs to perform the
global namespace initialization procedure.

One node will request that a local vnode or mount
point be granted global locking capability. The list server
102s acts as the server for the local vnode or mount
point and generates a local mount object 189 to represent
the mount point in the list server 102s. The list server
102s also generates a local mount provider object
161 for each node in the cluster. The list server 102s
then visits each node in the cluster, providing the node
with enough information for the node to construct a
proxy local mount (pxlocalmnt) object 167 and a local
mount cache object 165 in the client node. The cache/provider
object pair is then used to lock the mount point
that is distributed in each node in the cluster. As each
node is visited, locking responsibility for the vnode representing
the mount point on that node is transferred
from the vnode itself to the cache/provider pair newly
associated with that vnode.

Figs. 11 and 12 illustrate the steps used to provide
a mount point with a global locking capability. These
steps can be performed by an initialization procedure
159. A node 102 contacts the list server 102s with a request
to provide a mount point with global locking capability
(step 230). The list server 102s creates a local
mount object 189 representing the mount point in the
list server 102s and a local mount provider object 161
for each node in the cluster (step 232).

The list server 102s then "visits" each client node
by calling the list client on each node in turn, starting
with the client node 102a that initiated the request (step
234). The list server 102s provides information to each
client node 102a, 102c, which the list client of that node
uses to perform the following tasks. The
client node 102a, 102c will perform the pathname
lookup method on the mount point, thereby generating
a vnode 118 for the mount point. The list server 102s
will send the client node 102a, 102c an object reference
to the local mount object 189 which the client node 102a,
102c uses to construct a proxy local mount object 167
(pxlocalmnt). The pxlocalmnt 167 is linked to the vnode
118 representing the mount point. In addition, the client
node 102a, 102c generates a local mount cache object
165 that is paired to a corresponding local mount provider
161 (step 234).

Next, the local lock 183 associated with the mount
point's vnode 118 is acquired. This is performed in a similar
manner as was described above with respect to Figures
6 and 7 (step 234).

Once the local lock 183 is acquired, the global lock
flag 188 in the mount point's vnode 118 is turned on.
When the global lock flag 188 is turned on or set, this
indicates that the associated mount point has acquired
the global locking capability. The requesting client node
102a is then given write access to the global lock 182.
Lastly, the local lock 183 is released (step 234).

This procedure is performed in each client node
102a, 102c (step 234). At the completion of this procedure,
the mount point has acquired global locking capability.

It then can be used as a mount point for a new file
system resource. In this case, the global mounting procedure
175 described above can be used to mount a
new file system resource at the mount point.
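
The initialization of steps 230 through 234 can be sketched as follows. The sketch assumes an in-process model in which the list server's visit to a node is a direct method call; the names Vnode, Node, enable_global_locking and make_global_mount_point are hypothetical, the provider objects 161 are represented by placeholder objects, and the grant of write access on the global lock 182 to the requesting node is omitted.

```python
import threading


class Vnode:
    """Stand-in for the mount point's vnode (118) on one node."""

    def __init__(self, path):
        self.path = path
        self.local_lock = threading.Lock()  # local lock (183)
        self.global_lock_flag = False       # global lock flag (188)
        self.pxlocalmnt = None              # proxy local mount object (167)


class Node:
    def __init__(self, name):
        self.name = name
        self.vnodes = {}

    def enable_global_locking(self, path, local_mount_ref):
        # Pathname lookup: find or create the vnode for the mount point.
        vnode = self.vnodes.setdefault(path, Vnode(path))
        # Link the proxy for the list server's local mount object.
        vnode.pxlocalmnt = local_mount_ref
        # Set the flag while holding the local lock, then release the lock.
        with vnode.local_lock:
            vnode.global_lock_flag = True
        return vnode


def make_global_mount_point(nodes, requester, path):
    # Step 232: one local mount object plus one provider per node.
    local_mount = {"path": path, "providers": {n.name: object() for n in nodes}}
    # Step 234: visit every node, starting with the requesting node.
    ordered = [requester] + [n for n in nodes if n is not requester]
    for node in ordered:
        node.enable_global_locking(path, local_mount)
    return local_mount, [n.name for n in ordered]
```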

Accordingly, the global name space is generated by
creating global mount points from the local vnodes in
each node's local namespace. New file system resources
can then be mounted into the global name space at
these newly created global mount points. The progression
of these steps will generate the global name space.

Alternate Embodiments 

While the present invention has been described
with reference to a few specific embodiments, the description
is illustrative of the invention and is not to be
construed as limiting the invention. Various modifications
may occur to those skilled in the art without departing
from the scope of the invention.

The present invention is not limited to the computer
system described in reference to Fig. 1. It may be practiced
without the specific details and may be implemented
in various configurations, makes or models of distributed
computing systems or tightly-coupled processors,
or in various configurations of loosely-coupled microprocessor
systems.

Further, the method and system described herein- 
above is amenable for execution on various types of ex- 
ecutable mediums other than a memory device such as 
a random access memory. Other types of executable 
mediums can be used, such as but not limited to, a com- 
puter readable storage medium which can be any mem- 
ory device, compact disc, or floppy disk. 



Claims 



1. A method for maintaining a global name space in a
computing system having a plurality of nodes inter- 
connected by a communications link, the global 
name space representing a plurality of global file 
system resources accessible from each node, each 
global file system resource including a plurality of 
file resources, the global name space including glo- 
bal pathnames with each global pathname repre- 
senting one of the global file resources, each global 
pathname including one or more global directories, 
wherein the global name space is distributed over 
the nodes; 

the method comprising the steps of: 

(a) providing a first file system resource for 
mounting in the global name space at a desig- 
nated first mount point selected from the global 
directories; 
(b) concurrently locking the first mount point in
each node;

(c) mounting the first file system resource at the 
first mount point in each node; and 

(d) concurrently unlocking the first mount point 
in each node. 

2. The method of claim 1, comprising:

providing a second file system resource for un- 
mounting from the global name space at a sec- 
ond mount point selected from the global direc- 
tories; 

concurrently locking the second mount point in 
each node; 

unmounting the second file system resource
from the second mount point in each node; and

unlocking the second mount point in each node.



3. The method of claim 1 or claim 2, wherein:



the computing system includes a plurality of lo- 
cal name spaces, each local name space rep- 
resenting local file system resources associat- 
ed with one of the nodes, each local file system 
resource representing local file resources, the 
local name space including local pathnames, 
each local pathname representing one of the 
local file resources and including one or more 
local directories; 
the method including:

enabling one or more of the local directories
with a global locking capability that enables
the enabled local directory to be
locked by each of the nodes, the enabled
local directory being a global directory in
the global name space; and

concurrently mounting one or more of the
local file system resources into the global
name space at a select one of the global
directories, each mounted local file system
resource being a global file system resource.

4. The method of claim 3, wherein the concurrently
mounting step includes the steps of:

(i) concurrently locking the select global directory
in each node;

(ii) mounting a first local file system resource at
the select global directory in each node; and

(iii) concurrently unlocking the select global directory
in each node.

5. A computer system for maintaining a global name
space, the system including a plurality of nodes
interconnected by a communications link, the system
comprising:

a plurality of global file system resources accessible
from each of the nodes, each global
file system resource representing global file
resources;

a global name space representing the global
file system resources and including a plurality
of global pathnames that each represent one of
the global file resources, each global pathname
including one or more global directories, each
global directory being a global mount point to
which a new file system resource can be
mounted, the global name space distributed
over the nodes;

a first locking mechanism, distributed over the
nodes, for concurrently locking a same global
directory in each node; and

a global mount mechanism, distributed over the
nodes, for mounting a new file system resource
into the global name space at a specified mount
point in each node.

6. The system of claim 5, comprising: 

a plurality of vnode mechanisms, each vnode
mechanism representing in each node a global
directory associated with the global name
space;

a plurality of virtual file system (VFS) mechanisms,
each VFS mechanism representing in
each node a file system resource associated
with the global name space; and

the global mount mechanism establishing a
VFS mechanism for a newly mounted file system
resource in each node, establishing a vnode
mechanism in each node for a global
mount point, and linking the vnode mechanism
in each node to the VFS mechanism representing
the new file system resource.

7. The system of claim 6, comprising: 

a plurality of file system objects, each file system
object (FSobj) representing a file system
resource in a select node;

a plurality of proxy file system objects, each
proxy file system object (PxFSobj) used to reference
a corresponding FSobj; and

the global mount mechanism designating one
of the nodes as a server node for a newly
mounted file system resource, the server node
generating a FSobj for the newly mounted file
system resource, linking the FSobj to a corresponding
VFS representing the newly mounted
file system resource and passing an object reference
to the newly mounted file system resource
to all other nodes, each node receiving
the object reference generating a proxy file system
object (PxFSobj) from the object reference
and linking the PxFSobj to a corresponding
VFS representing the newly mounted file system
resource in the respective node.

8. The system of any one of claims 5 to 7, comprising: 

the global mount mechanism having means 
for unmounting a mounted file system resource 
from the global name space, the global mount 
mechanism locating a same mount point in each 
node from which the mounted file system resource 
is unmounted, concurrently locking the same mount 
point in each node, unmounting the mounted file 
system resource from the same mount point in each 
node, and unlocking the mount point in each node. 

9. The system of any one of claims 5 to 8, comprising: 

a plurality of local name spaces, each local 
name space associated with a particular node 
and representing local file system resources 
associated with the particular node, each local 
file system resource representing local file re- 
sources, the local name space including local 
pathnames, each local pathname representing 
one of the local file resources and including one 
or more local directories; and 
an initialization mechanism that enables a first 
local directory with a global locking capability, 
thereby making the enabled local directory a 
global mount point, and that mounts a local file 
system resource into the global name space at 
a global mount point. 

10. The system of claim 9, further comprising: 

a second locking mechanism distributed in 
each node, the second locking mechanism 
having a capability to concurrently lock a same 
local directory in each node; and 
the initialization mechanism using the second 
locking mechanism to lock the first local direc- 
tory in each node to enable the first local direc- 
tory with the global locking capability. 